INTRODUCTION

This  proposal  aims  to  exploit  advances  in  biotechnology  and  informatics  to  develop  a  genetics 
resource  termed  the  Prostate  Expression  Database  (PEDB)  (http://www.pedb.org).  PEDB  is  an 
integrated  resource  focused  exclusively  on  prostate  cancer  that  incorporates  public  DNA  and 
protein  sequence  and  informatics  resources  where  applicable.  The  foundation  of  PEDB  is  the 
identification  and  characterization  of  a  prostate  transcriptome,  the  intermediary  between  the 
genome  and  the  proteome  that  represents  that  portion  of  the  human  genome  actively  used  or 
transcribed  in  the  prostate.  This  proposal  will  extend  PEDB  capabilities  by  accomplishing  the 
following  specific  objectives:  1)  assemble  and  annotate  a  working  prostate  transcriptome;  2) 
develop  a  suite  of  database  tools  to  facilitate  investigator-initiated  database  queries;  3)  extend  the 
prostate  transcriptome  in  3  dimensions:  acquiring  rare  transcripts,  assembling  sequences 
representing  full-length  genes,  and  mapping  the  locations  of  interesting  and  novel  prostate  genes; 
and  4)  assemble  a  solid-phase  nonredundant  archive  of  prostate-derived  cDNA  clones  for 
distribution to investigators and to the IMAGE Consortium sites.


BODY 

The  following  summarizes  the  technical  objectives  for  the  proposal  and  the  work  accomplished 
during  the  8-month  interval  between  the  last  report  (07/14/00)  and  the  preparation  of  this  report 
(03/14/01). 

Technical  objective  1:  To  assemble  and  annotate  a  working  prostate  transcriptome 
(months  1-16) 

•  Task 1: Install Phrap, d2-cluster, and CAPS software and test on small, known
genomic sequence assemblies (months 1-3). Completed. Phrap was selected as the
assembly algorithm.

•  Task 2: Assemble UniGene and prostate EST test sets. Compare with previous
assemblies performed with CAP2. Manually review assembly discrepancies.
Compare assemblies with UniGene and CGAP clusters (months 1-6). Completed.

•  Task  3:  Assemble  and  annotate  all  PEDB  sequences  using  best  available  algorithm 
(months  6-12).  Completed.  We  have  completed  the  download  of  10,000  additional 
chromatograms  from  the  Washington  University  web  site  and  added  an  additional 
4,000  ESTs  from  our  own  sequencing  project.  The  assembly  of  these  ESTs  with 
phrap  has  been  completed  and  the  contigs  have  been  annotated  against  sequences 
housed  in  Genbank  and  Unigene.  The  latest  assembly  statistics  are  shown  in  Figures 
1  and  2  with  the  assembly  schema  and  results. 
[Figure 1 schematic: 55 cDNA libraries (normal, PIN, primary CA, metastatic CA, cell lines,
xenografts) -> ~84,460 ESTs -> assembled -> annotated -> prostate transcriptome.
Surviving interface text: "sequence or text strings using BLAST algorithms."]

Figure 1. (above) WWW interface for the Prostate Expression Database (PEDB). 55 libraries containing
~85,000 ESTs have been processed, assembled, and annotated to comprise a prostate transcriptome.

Figure 2. (below) Assembly process for PEDB ESTs. ~85,000 ESTs were screened for low-quality,
vector, E. coli, and repetitive sequences. The remaining ESTs were assembled using Phrap and annotated
against sequences in the public nucleotide databases. Following re-assembly, a prostate transcriptome of
17,456 distinct species was entered into PEDB.


[Figure 2 schematic: Results of PEDB Assembly and Annotation]

84,460 ESTs
  -> AnalDemon (screens: <100 bp, vector, E. coli, repeat)
  -> 75,000 ESTs
  -> Phred/Phrap
  -> 27,397 clusters (species)

Annotation results (+ / -):
  Genbank:  10,981 +    16,416 -
  dbEST:    13,369 +     3,047 -
  Unigene:  13,121 +    14,276 -

Results after annotation re-group:
  Total: 17,456 clusters/species (prostate transcriptome)
  >300 bp, <50% N = 2,553
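
The screening stage summarized in Figure 2 can be illustrated with a short sketch. This is not the AnalDemon code. It is a minimal Python illustration that assumes contaminant and repeat hits arrive as (start, end) coordinates from an external tool, that masked bases are written as X, and that a read is dropped when fewer than 100 unmasked bases remain (one possible reading of the "<100bp" label in the figure). All function names are hypothetical.

```python
def mask_intervals(seq, intervals, mask_char="X"):
    """Replace each half-open [start, end) interval of seq with mask_char.

    The intervals stand in for vector, E. coli, or repeat hits reported by
    an external screening tool (an assumption of this sketch).
    """
    bases = list(seq)
    for start, end in intervals:
        for i in range(max(start, 0), min(end, len(bases))):
            bases[i] = mask_char
    return "".join(bases)


def usable_length(masked_seq, mask_char="X"):
    """Count bases that are neither masked nor ambiguous (N)."""
    return sum(1 for b in masked_seq if b != mask_char and b.upper() != "N")


def passes_filter(masked_seq, min_len=100, mask_char="X"):
    """Keep a read only if enough unmasked sequence remains.

    min_len=100 is an assumed threshold taken from the figure label.
    """
    return usable_length(masked_seq, mask_char) >= min_len


if __name__ == "__main__":
    read = "ACGT" * 30                        # 120-base toy read
    masked = mask_intervals(read, [(0, 25)])  # pretend bases 0-24 hit vector
    print(usable_length(masked), passes_filter(masked))
```

Masking rather than trimming keeps the coordinates of every read unchanged, so positions reported later by an assembler still refer to the original trace.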
•  Task 4: Develop a gene classification schema based upon function, and automate
assignments  of  clusters  to  functional  groups  (months  8-16).  We  have  completed  a 
functional  annotation  scheme  modeled  after  the  TIGR  annotation  scheme  for 
partitioning  genes  into  cellular  functional  groups.  This  has  been  applied  to  the 
LNCaP  dataset  and  is  available  for  viewing  and  analysis  on  the  PEDB  website. 

•  Task  5:  Develop  a  Graphical  User  Interface  for  viewing  and  navigating  between 
sequence, functional group, and expression data (months 3-12). A user interface has
been  developed  for  the  viewing  of  sequence  chromatograms  and  for  searching  the 
database  with  keywords  in  addition  to  BLAST  queries. 

•  Task 6: Write scripts to automate the input of new prostate ESTs, processing of new
ESTs, clustering of the database sequences, and annotation of the entire cluster
complement on a monthly basis (months 12-16). Completed.

Technical objective 2: To develop a suite of database tools to facilitate investigator-initiated
database queries (months 1-18). In progress.

•  Task 7: Evaluate potential sequence cluster/assembly viewing tools: DrawMap,
Consed, Phrapview, CloneView, and AlignView (months 6-12).

•  Task 8: Design a client application (ContigView) extending the functionality of the
Virtual Expression Analysis Tool to view the cluster output produced by the best
algorithm as identified in specific aim 1 (months 8-14).

•  Task  9:  Design  GUI  to  support  high  level  viewing  of  clustered  data  with  graphical 
maps  incorporating  zoom  features  for  viewing  nucleotide  sequence  traces  and 
assemblies  (8-14).  A  tool  for  viewing  individual  sequence  traces  has  been  developed 
and  implemented  into  PEDB.  A  tool  for  viewing  sequence  assemblies  is  in  progress. 

•  Task 10: Write Java code for ContigView and test on datasets representing
assemblies of few and many ESTs with both short and long consensus sequences
(months 10-24).

•  Task 11: Test (and modify if necessary) ContigView on Windows/NT/Macintosh/Unix
operating systems (months 24-30).

•  Task  12:  Write  applications  to  link  cluster  consensus  to  relevant  public  databases 
(Genbank,  etc)  (months  20-24) 

•  Task  13:  Write  applications  for  integrating  gene  analysis  tools:  exon  prediction, 
promoter  finders,  transcription  factor  binding  site  ID,  protein  motif  ID  (months  18- 
28). 

•  Task  14:  Evaluate  the  incorporation  of  software  for  SNP  detection  (PolyPhred)  in 
client-selected  PEDB  clusters  (26-30). 

Technical objective 3: To extend the prostate transcriptome in 3 dimensions: 1) acquire rare
transcripts, 2) assemble sequences representing full-length genes, and 3) map the location of
EST clusters to specific chromosomal sites (months 12-25).

•  Task 15: construct LNCaP random primed library and CAP-finder library (months 6-7).
We have constructed one prostate cDNA library from androgen-stimulated LNCaP and one
cDNA library from androgen-starved LNCaP. A total of 3,000 ESTs have now been
generated from each library to date. EST assemblies from these libraries have been used to
virtually determine the gene expression network regulated by androgenic hormones. These
datasets have been compared to profiles produced by the Serial Analysis of Gene Expression
(see Clegg et al. in reportable outcomes). Several genes previously not recognized to be under
androgen regulation were identified.

Nelson-DOD-Progress Report 04/01

•  Task  16:  partially  sequence  1,600  cDNAs  from  each  library  and  enter  ESTs  into  PEDB. 
(months  8-12)  See  above.  All  6,000  ESTs  have  been  entered  into  PEDB,  assembled  using 
phrap,  and  annotated  against  sequences  present  in  the  public  nucleotide  databases. 

•  Task 17: as above with normal prostate tissue (months 13-18). We have constructed cDNA
libraries  from  microdissected  luminal  cell,  basal  cell,  and  stromal  tissue.  A  total  of  1,500 
ESTs  have  been  produced  from  these  libraries. 

•  Task 18: as above with microdissected primary prostate cancer tissue (months 25-30)

•  Task 19: "Negative Select" 10,000 cDNAs from normal prostate cDNA array (months 19-20)

•  Task  20:  partially  sequence  10,000  negatively  selected,  low  abundance  cDNAs  and  submit 
ESTs  into  PEDB  (months  21-25) 

•  Task 21: Identify 60 interesting uncharacterized prostate ESTs/cDNAs based upon a)
homology to known physiologically important genes or b) novelty, to directly obtain
full-length cDNA sequence using RACE, library screening, genomic assembly, and
primer-directed sequencing. A total of 15 full-length cDNAs per year will be obtained (ongoing
throughout period of award). We have cloned and sequenced 10 full-length genes expressed
in prostate. One of these, Prostate Short-Chain Dehydrogenase Reductase 1 (PSDR1), has
been extensively characterized and a manuscript published in Cancer Research (see Lin et al.
in reportable outcomes). The remaining genes are under further evaluation.

•  Task 22: Map interesting prostate cDNAs described above using radiation hybrid panel
mapping (ongoing throughout period of award). We have now mapped 8 novel prostate
genes  using  radiation  hybrid  panel  mapping.  Future  work  will  automate  mapping  procedures 
using  the  available  human  genome  sequence  annotation. 

•  Task  23:  submit  data  to  PEDB  and  public  databases  (ongoing  throughout  period  of  award). 

Technical objective 4: To assemble a solid-phase nonredundant archive of prostate-derived
cDNA clones.

•  Task  24:  identify  a  cohort  consisting  of 3,000  distinct,  unique  prostate  clusters  from  year  1 
PEDB  assembly  (month  10).  We  have  assembled  a  non-redundant  set  of  6,000  cDNAs  from 
LNCaP,  normal  and  neoplastic  prostate  cDNA  libraries.  These  have  now  been  re-arrayed  into 
96-well  and  384-well  microtiter  plates.  The  clone  set  has  been  replicated.  PCR  amplification 
has  been  performed. 

•  Task 25: cross-reference cluster sequences with PEDB clone archive to determine the clones
physically available for biological studies (month 11). Completed.

•  Task 26: determine the longest physical clone for each cluster and consolidate bacterial
transformants into 96-well plates using the Genetix Q-bot. Preserve for storage (months 12-13).
Completed.

•  Task 27: annotate and ship to IMAGE consortium clone distributors (month 14). In progress.

•  Task  28:  repeat  Tasks  24-27 for  3,000  additional  unique  clusters  at  the  end  of  month  24.  In 
progress. 

•  Task  29:  repeat  Tasks  24-27 for  3,000  additional  unique  clusters  at  the  end  of  month  35. 
•  Task 30: plan for incorporation and integration of PEDB with microarray data and
proteomics  data  (months  24-36).  Currently  in  planning  stages.  We  have  obtained  database 
software  for  archiving  and  analyzing  microarray  data  from  Stanford  University.  We  are 
currently  testing  for  compatibility  with  PEDB. 

•  Task  31:  analyze/compile  data  and  prepare  formal  report  (month  36). 


KEY  RESEARCH  ACCOMPLISHMENTS 

•  Selected  phrap  as  the  sequence  assembly  algorithm  for  PEDB.  Assembled  and  annotated 
85,000  PEDB  ESTs. 

•  Constructed  cDNA  libraries  from  microdissected  luminal  epithelial  cells,  basal  epithelial 
cells,  and  stromal  cells. 

•  Sequenced  3,000  cDNAs  from  LNCaP,  luminal  cell,  basal  cell  and  stromal  cell  cDNA 
libraries  (total  4,000  ESTs)  and  assembled  the  ESTs  into  clusters/contigs.  The  data  indicate 
the  libraries  are  of  good  quality  with  significant  diversity. 

•  Virtual  comparison  of  the  LNCaP  libraries  identified  3  additional  new  androgen-regulated 
genes.  Northern  analysis  confirmed  androgen-regulation  for  these  genes. 

•  Constructed  a  cDNA  library  of  prostate  small  cell  carcinoma.  3,000  cDNA  clones  have  been 
sequenced  and  deposited  into  PEDB. 

•  Compiled  a  non-redundant  virtual  and  physical  archive  of  prostate  ESTs/cDNAs  comprising 
6,000  distinct  species.  These  clones  have  been  consolidated,  replicated,  and  arrayed  for 
cDNA  microarray  analysis. 

•  Identified a novel gene with prostate expression specificity, PSDR1, that is hypothesized to
function in steroid metabolism. PSDR1 is highly expressed in primary and metastatic
prostate carcinoma.
CONCLUSIONS

The  research  accomplished  to  date  has  assembled  a  working  virtual  prostate  transcriptome  that 
defines  the  genes  used  or  transcribed  in  prostate  cell  types  and  tissues.  This  transcriptome  has  a 
physical  counterpart  of  6,000  cDNAs  arrayed  in  cDNA  microarray  format  for  large-scale 
expression  studies.  This  transcriptome  has  been  used  as  a  foundation  for  studies  of  the  prostate 
proteome,  the  working  counterpart  to  the  genome  and  transcriptome.  Our  results  show  that  these 
approaches  are  complementary.  Analysis  of  the  virtual  transcriptome  of  LNCaP  cells  has 
identified  13  new  androgen-regulated  genes  to  date.  Characterization  of  these  genes  is  in 
progress. One gene, PSDR1, exhibits sequence homology with a family of proteins involved in
steroid  hormone  metabolism,  and  may  modulate  steroid  activity  within  the  prostate. 
Comprehensive analyses of prostate gene
expression: Convergence of expressed sequence
tag databases, transcript profiling and proteomics

Several methods have been developed for the comprehensive analysis of gene
expression in complex biological systems. Generally these procedures assess either a
portion of the cellular transcriptome or a portion of the cellular proteome. Each
approach has distinct conceptual and methodological advantages and disadvantages.
We have investigated the application of both methods to characterize the gene expression
pathway mediated by androgens and the androgen receptor in prostate cancer
cells. This pathway is of critical importance for the development and progression of
prostate cancer. Of clinical importance, modulation of androgens remains the mainstay
of treatment for patients with advanced disease. To facilitate global gene expression
studies we have first sought to define the prostate transcriptome by assembling and
annotating prostate-derived expressed sequence tags (ESTs). A total of 55 000 prostate
ESTs were assembled into a set of 15 953 clusters putatively representing 15 953
distinct transcripts. These clusters were used to construct cDNA microarrays suitable
for examining the androgen-response pathway at the level of transcription. The expression
of 20 genes was found to be induced by androgens. This cohort included known
androgen-regulated genes such as prostate-specific antigen (PSA) and several novel
complementary DNAs (cDNAs). Protein expression profiles of androgen-stimulated
prostate cancer cells were generated by two-dimensional electrophoresis (2-DE).
Mass spectrometric analysis of androgen-regulated proteins in these cells identified
the metastasis-suppressor gene NDKA/nm23, a finding that may explain a marked
reduction in metastatic potential when these cells express a functional androgen receptor
pathway.

Keywords: Prostate / Transcriptome / Proteome / Androgen / Microarray    EL 3957


1  Introduction

The development and subsequent progression of human
prostate carcinoma is propelled by the accumulation of
genetic alterations and influenced by environmental factors.
One pivotal mediator of prostate carcinogenesis is
the androgen receptor (AR) pathway. The majority of
prostate cancers initially require androgens for growth,
and the elimination of AR ligands by surgical or chemical
castration leads to marked tumor regression through a
mechanism of programmed cell death [1]. The manipulation
of the AR pathway has been used in clinical medicine
since the 1940s as the primary treatment of advanced
prostate cancer. However, this therapy is palliative and
eliminates the potential beneficial effects of androgen-induced
cellular differentiation. Surviving cancer cells lose
their dependence on androgens over time and are capable
of proliferation in the absence of serum androgens. The
molecular events leading to androgen independence (AI)
have not been defined, but potential mechanisms include
overexpression of the AR, mutations in the AR gene leading
to promiscuous ligand binding, and the activation of
the AR or downstream regulatory molecules by other
endocrine or paracrine growth factors [2, 3].

Until recently, biological investigations have almost
entirely focused on the study of individual genes and proteins.
This has partly been due to the submicroscopic
nature and transient existence of relevant molecules,
combined with a lack of quantitative technology capable
of providing accurate comprehensive views of biological
complexity. Significant advances have been achieved
studying individual genes, proteins and small numbers of
molecular interactions. However, conclusions made on
the basis of the study of an individual gene may have limited
relevance as to how the gene and gene product function
in the context of the whole cell, tissue, or organism.
Progress in understanding complex molecular processes
has been hampered by the lack of a complete inventory
or "tool-set" of all genes and their cognate proteins that
are necessary for defining phenotypes of normal and
pathological cellular states.

The completion of the Human Genome Project will provide
a foundation for a thorough description of this molecular
inventory. More specifically, the gene inventory or
tool set required for studies of prostate carcinogenesis is
that portion of the human genome used or expressed in
the human prostate gland. The subset of genes transcribed
or expressed in a given cell or tissue type such as
the prostate may be defined as the "transcriptome", the
dynamic link between the genome, the proteome, and the
cellular phenotype associated with physical characteristics
[4]. Once a transcriptome has been described, the
next objective is to understand the relationships of the
genes and their protein products in terms of a complex
system, e.g., biological pathways and networks, that may
define health and disease. With this aim, novel technologies
for comprehensively assessing genomes and patterns
of gene expression have recently been developed.

Our initial efforts have focused on defining the prostate
transcriptome through the production and assembly of
expressed sequence tags (ESTs) derived from prostate
complementary DNA (cDNA) libraries representing a wide
spectrum of normal and neoplastic states. These EST
assemblies have been used to construct cDNA microarrays
suitable for interrogating the transcriptome in experiments
designed to examine specific biological pathways
that may be involved in prostate carcinogenesis. The molecular
pathway mediating androgenic hormone action on
prostate cells is a specific focus of our work. The functional
architecture of prostate gene networks is further elucidated
by our next level of analysis, which incorporates studies
of the prostate proteome. Analysis of the transcriptome
facilitates proteome studies by providing a comprehensive
prostate sequence database for identifying and
annotating known and unknown proteins displayed by
two-dimensional gel electrophoresis (2-DE) and analyzed
by mass spectrometry (MS). Our objectives for delineating
the molecular network(s) influenced by AR activation
are to identify specific targets that promote the differentiation
and apoptotic potential of prostate cancer cells while
reducing their ability to proliferate.

2  Materials  and  methods 

2.1  Assembly of a prostate transcriptome:
Prostate Expression Database (PEDB)

A prostate transcriptome was assembled from ESTs derived
from cDNA libraries representing a wide spectrum
of normal, benign, and malignant prostate tissues. The
assembly and annotation procedure is described in detail
elsewhere [5]. Briefly, individual
ESTs, detailed cDNA library information, and sequence
annotations were loaded into a relational database (Oracle
Corp.) termed the Prostate Expression Database
(PEDB). Prostate ESTs used for the assembly were generated
in our laboratory as previously described [6]. Additional
public domain ESTs of prostate origin were obtained
from Genbank (http://www.ncbi.nlm.nih.gov/Entrez/batch.html),
the NCI Cancer Genome Anatomy Project
(CGAP) [7], and The Institute for Genomic Research
(TIGR) (http://www.tigr.org). Each EST was examined for
sequence homology to cloning vectors, Escherichia coli,
and repetitive DNA sequences using a core program
called AnalDemon (http://www.mbt.washington.edu/PEDB/software).
AnalDemon first employs Cross_Match
(http://bozeman.mbt.washington.edu/phrap.docs/general.html),
a program based on the Smith-Waterman-Gotoh
algorithm, to screen for vector and E. coli genome contamination.
ESTs are then examined for interspersed
repeats and regions of low sequence complexity using
RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html).
Specific portions of EST sequences
exhibiting homology to any of these unwanted elements
are masked so that they do not contribute
to the assembly process. CAP2 [8], a multiple
alignment program based on a variant of the Smith and
Waterman algorithm, was used for assembling ESTs into
homologous groups or clusters. Clustering is based on
maximal scoring of overlapping alignments and allows for
general substitutions resulting from sequencing errors,
insertions, and deletions. CAP2 produces a consensus
sequence and allows varying sensitivity and overlap
parameters. Each group or cluster of ESTs exhibiting significant
homology with one another is termed a species.
Thus, a species is a sequence or group of sequences that
is unique relative to the nucleotide sequence of other
groups of sequences, and each is given a unique PEDB
Species Identification number (SID). The SID provides a
means to analyze gene expression across the entire set
of assemblies, and can be used to provide a library-by-library
species-specific differential expression profile.
Each distinct species from the clustering process was
annotated by searching the Unigene (ncbi.nlm.nih.gov in
/pub/schuler/unigene), Genbank (ncbi.nlm.nih.gov/blast/db/nt.Z),
and EST databases (ncbi.nlm.nih.gov/blast/db/est.Z)
using BLASTN (http://blast.wustl.edu). Annotations
were assigned automatically using the program SmartBlast
(http://www.mbt.washington.edu/PEDB/software) by
selecting the database match with the lowest p value and
the highest blast score where the maximum p value is
e^-20 and the minimum blast score is 500.
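
The annotation rule in the last sentence can be sketched in a few lines. This is an illustration only and not the SmartBlast source: the Hit record and pick_annotation function are hypothetical names, the p value cutoff is passed in by the caller rather than fixed, and the ordering used here (lowest p value first, ties broken by highest score) is an assumption about how the two criteria combine.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    """One database match for a cluster consensus (hypothetical record)."""
    subject: str
    p_value: float
    score: float


def pick_annotation(hits: Sequence[Hit], max_p: float,
                    min_score: float = 500.0) -> Optional[Hit]:
    """Return the best eligible hit, or None if nothing passes the cutoffs.

    A hit is eligible when its p value is at most max_p and its score is at
    least min_score. Among eligible hits the lowest p value wins, and ties
    are broken by the highest score (an assumed ordering).
    """
    eligible = [h for h in hits if h.p_value <= max_p and h.score >= min_score]
    if not eligible:
        return None
    return min(eligible, key=lambda h: (h.p_value, -h.score))


if __name__ == "__main__":
    example = [
        Hit("match_a", 1e-30, 600.0),
        Hit("match_b", 1e-30, 900.0),
        Hit("match_c", 1e-50, 400.0),  # strong p value but score below 500
    ]
    # The cutoff value below is illustrative only.
    print(pick_annotation(example, max_p=1e-20))
```

A cluster for which the function returns None would remain unannotated in that database, which corresponds to the "-" counts in an assembly summary.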




Electrophoresis  2000, 21, 1823-1831 

2.2  Prostate transcriptome analyses by cDNA microarray

2.2.1  Microarray  fabrication 

A nonredundant set of 1500 prostate-derived cDNA
clones was identified from the prostate transcriptome
archived in PEDB. Individual clone inserts were amplified
by PCR using 2 µL of bacterial transformant culture
as template with primers BL_m13F
(5'-GTAAAACGACGGCCAGTGAATTG-3') and BL_m13R
(5'-ACACAGGAAACAGCTATGACCATG-3') as previously described [6].
PCR products were purified through Sephacryl S500
(Amersham Pharmacia Biotech, Uppsala, Sweden),
mixed 1:1 with dimethylsulfoxide, and spotted in duplicate
onto coated Type VII glass microscope slides (Amersham
Pharmacia Biotech) using a Molecular Dynamics (Sunnyvale,
CA, USA) Gen III robotic spotting tool. After spotting,
the glass slides were air-dried and UV-cross-linked with
500 mJ of energy and then baked at 95°C for 30 min.

2.2.2  Probe construction and microarray hybridization

Total RNA was isolated from the androgen-responsive
LNCaP prostate cancer cells [9] at time points of 0, 4, 8,
24, and 72 h after androgen depletion or supplementation
using TRIzol (Life Technologies, Paisley, UK) according
to the manufacturer's directions. Fluorescence-labeled
probes were made from 30 µg of total RNA in a reaction
volume of 20 µL containing 1 µL anchored oligo-dT primer
(Amersham Pharmacia Biotech), 0.05 mM Cy3-dCTP
(Amersham Pharmacia Biotech), 0.05 mM dCTP, 0.1 mM
each dGTP, dATP, dTTP, and 200 U SuperScript II
reverse transcriptase (Life Technologies). Reactants
were incubated at 42°C for 120 min followed by heating to
94°C for 3 min. Unlabeled RNA was hydrolyzed by the
addition of 1 µL of 5 N NaOH and heating to 37°C for
10 min. One µL of 5 M HCl and 5 µL of 1 M Tris-HCl,
pH 7.5, were added to neutralize the base. Unincorporated
nucleotides and salts were removed by chromatography
(Qiagen, Chatsworth, CA, USA), and the cDNA
was eluted in 30 µL dH2O. One µg of dA/dT 12-18
(Amersham Pharmacia Biotech) and 1 µg of human Cot1
DNA (Life Technologies) were added to the probe, which was
heat-denatured at 94°C for 5 min, combined with an equal volume
of 2× microarray hybridization solution (Amersham
Pharmacia Biotech) and prehybridized at 50°C for 1 h.
The mixture was then placed onto a microarray slide with
a coverslip and hybridized in a humid chamber at 52°C
for 16 h. The slides were washed once with 1× sodium
chloride and sodium citrate (SSC), 0.2% SDS at room
temperature for 5 min and then twice with 0.1× SSC,
0.2% SDS at room temperature for 10 min. After washing,
the slide was rinsed in distilled water to remove trace salts
and dried.
2.2.3  Image acquisition and data analyses

Fluorescence intensities of the immobilized targets were
measured using a laser confocal microscope (Molecular
Dynamics). Intensity data were integrated at a pixel resolution
of 10 µm using approximately 20 pixels per spot,
and recorded at 16 bits. Quantitative data were obtained
with the SpotFinder Version 2.4 program written at the
University of Washington. Local background hybridization
signals were subtracted prior to comparing spot intensities
and determining expression ratios. For each experiment,
each cDNA was represented twice on each slide,
and the experiments were performed in duplicate, producing
four data points per cDNA clone and hybridization
probe. Intensity ratios for each cDNA clone hybridized
with probes derived from androgen-stimulated LNCaP
and androgen-starved LNCaP cells were calculated
(stimulated intensity/starved intensity). Gene expression
levels were considered significantly different between the
two conditions if all four replicate spots for a given cDNA
demonstrated a ratio > 2 or < 0.5, and the signal intensity
was greater than two standard deviations above the
image background. We have previously determined that
expression ratios less than 1.5 are not reproducible in our
system (data not shown).
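
The calling rule described above can be written out as a small function. This is a sketch rather than the SpotFinder logic: it assumes the inputs are background-subtracted intensities for the four replicate spots, that "two standard deviations above the image background" is applied to the brighter of the two channels at each spot, and that all four ratios must point in the same direction. The name classify_clone is hypothetical.

```python
def classify_clone(stimulated, starved, background_sd, fold=2.0):
    """Classify one cDNA clone from paired replicate intensities.

    stimulated, starved: background-subtracted intensities, one pair per
        replicate spot (four pairs in the design described in the text).
    background_sd: standard deviation of the image background.
    Returns "induced", "repressed", or "no_call".
    """
    if len(stimulated) != len(starved) or not stimulated:
        raise ValueError("need equal, non-empty replicate lists")
    floor = 2.0 * background_sd
    ratios = []
    for s, t in zip(stimulated, starved):
        # A spot is unusable if neither channel clears the signal floor or
        # if subtraction left a non-positive value (ratio undefined).
        if max(s, t) <= floor or s <= 0 or t <= 0:
            return "no_call"
        ratios.append(s / t)
    if all(r > fold for r in ratios):
        return "induced"
    if all(r < 1.0 / fold for r in ratios):
        return "repressed"
    return "no_call"


if __name__ == "__main__":
    print(classify_clone([400, 420, 390, 410], [100, 110, 90, 100], 20))
```

Requiring every replicate to pass, rather than averaging first, means a single discordant spot is enough to withhold a call, which is the conservative behavior the text describes.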

2.3  Prostate proteome analyses by 2-DE and MS

2.3.1  2-DE 

LNCaP prostate cancer cells were grown under conditions of androgen stimulation or
androgen starvation as described above. M12AR cells, a highly metastatic prostate
cancer cell line derived from the serial passaging of SV40-immortalized prostate
epithelial cells [10] and transfected with the AR, were grown in serum-free DMEM
high-glucose media (Life Technologies) supplemented with insulin, transferrin,
selenium, and dexamethasone as previously described [11]. Cells were allowed to
reach 80% confluency and then treated for 24 h with the same media supplemented
with 10 nM R1881. Cells were washed once with PBS, scraped from plates with a
rubber policeman, and pelleted by centrifugation. Protein was harvested as
described by Garrels and Franza [12]. Briefly, cell pellets were lysed in a buffer
containing 0.3% SDS, 1% β-mercaptoethanol, 50 mM Tris-HCl, pH 8.0, 100 µg/mL
DNAase I, 50 µg/mL RNAase A, and 5 mM MgCl2, and heated for 1 min at 100°C.
Harvested protein was lyophilized, resuspended in isoelectric focusing (IEF) gel
rehydration solution, and stored at -80°C. Soluble proteins were run in the first
dimension by using a commercial flatbed electrophoresis system (Multiphor II;
Amersham Pharmacia Biotech). Nonlinear immobilized pH gradient (IPG) dry strips
ranging from pH 3.0 to 10.0 (Amersham Pharmacia Biotech) were used for the
first-dimensional separation. Forty micrograms of protein from whole-cell lysates
were mixed with IPG strip rehydration buffer (8 M urea, 2% Nonidet P-40, 10 mM
dithiothreitol), and 250-380 µL of solution (13 and 18 cm IPGs, respectively) was
added to individual lanes of an IPG strip rehydration tray (Amersham Pharmacia
Biotech). The strips were rehydrated at room temperature for 1 h. The samples were
run at 300 V, 10 mA, 5 W for 2 h, ramped to 3500 V, 10 mA, 5 W over a period of
3 h, and then kept at 3500 V, 10 mA, 5 W for 15-19 h. Following IEF (60-70 kVh),
the IPG strips were first reequilibrated for 8 min in a solution of 2% w/v
dithiothreitol, 2% w/v SDS, 6 M urea, 30% w/v glycerol, 0.05 M Tris-HCl (pH 6.8)
and subsequently for 4 min in a solution of 2.5% w/v iodoacetamide, 2% w/v SDS,
6 M urea, 30% w/v glycerol, 0.05 M Tris-HCl (pH 6.8) with a trace of bromophenol
blue added for color. Following reequilibration, the strips were transferred and
apposed to 10% polyacrylamide second-dimensional gels. Polyacrylamide gels were
poured in a casting stand with 10% acrylamide-2.67% piperazine diacrylamide,
0.375 M Tris, pH 8.8, 0.1% w/v SDS, 0.05% w/v ammonium persulfate, 0.05% TEMED
(N,N,N',N'-tetramethylethylenediamine) in Milli-Q water (Millipore, Bedford, MA,
USA). Second-dimensional gels (0.1 × 20 × 20 cm) were run in an apparatus supplied
by Oxford Glycosciences (Abingdon, UK). Once the IPG strips were apposed to the
second-dimensional gels, they were immediately run at a constant current of 50 mA
at 500 V and 85 W for 20 min, followed by a constant current of 200 mA at 500 V
and 85 W until the buffer front was 10-15 mm from the bottom of the gel. Gels were
removed and silver stained according to the procedure of Blum et al. [13].
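
The focusing program above can be checked against the stated 60-70 kVh with a short calculation. The sketch below is illustrative only: it assumes the run is voltage-limited throughout and that the ramp from 300 V to 3500 V is linear, neither of which the text states. The function names are hypothetical.

```python
def volt_hours(steps):
    """Sum volt-hours over (start_V, end_V, hours) steps.

    Each step is integrated as a trapezoid, i.e. ramps are assumed linear.
    """
    return sum((v0 + v1) / 2.0 * hours for v0, v1, hours in steps)


def ief_program(hold_hours):
    """The three-step program from the text, with a variable final hold."""
    return [
        (300, 300, 2),             # 300 V for 2 h
        (300, 3500, 3),            # ramp to 3500 V over 3 h (linear ramp assumed)
        (3500, 3500, hold_hours),  # hold at 3500 V for 15-19 h
    ]


low_kvh = volt_hours(ief_program(15)) / 1000.0
high_kvh = volt_hours(ief_program(19)) / 1000.0
print(f"{low_kvh:.1f}-{high_kvh:.1f} kVh")
```

Under these assumptions the program delivers roughly 59-73 kVh, which brackets the 60-70 kVh quoted above.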

2.3.2 Protein identification by tandem mass spectrometry

Protein spots from gels were identified by tandem mass spectrometry (MS/MS) as
previously described [14]. Spots from silver-stained gels were excised and in-gel
tryptic peptides were separated by microcapillary LC (µLC) coupled to a tandem
mass spectrometer (TSQ 7000; Finnigan, San Jose, CA). Peptide fragmentation
spectra were generated in a data-dependent fashion. Spectra were searched against
the composite OWL protein sequence database by using the computer program SEQUEST
[15] and against the PEDB. A protein match was determined by comparing the number
of peptides identified and their respective cross-correlation scores. Protein
identifications were verified by comparison with theoretical molecular weights
and isoelectric points.
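
As a rough illustration of two steps in this workflow, the sketch below performs an in silico tryptic digest and computes a theoretical average mass from a sequence. It is not SEQUEST and does not reproduce its scoring. The function names and the test sequence are hypothetical; the residue masses are standard average values.

```python
# Average residue masses in daltons. A peptide's mass is the sum of its
# residue masses plus one water molecule.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide without K/R
    return peptides


def average_mass(sequence):
    """Theoretical average mass (Da) of an unmodified peptide or protein."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


# Hypothetical sequence, used only to show the cleavage rule.
for peptide in tryptic_digest("AKRPGK"):
    print(peptide, round(average_mass(peptide), 2))
```

Applying `average_mass` to a full-length database entry gives the kind of theoretical molecular weight that can be compared with a spot's position on the gel.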

3  Results  and  discussion 

3.1 Prostate gene expression analyses: EST assemblies and annotation

ESTs produced from cDNA libraries derived from normal and neoplastic human
prostate tissue samples were entered into the PEDB, an Oracle relational database
running on a Sun SPARC workstation. The most recent PEDB build was assembled
starting with 55 000 prostate ESTs produced from 42 cDNA libraries. Portions of
EST sequences with homology to cloning vector, E. coli genomic DNA, and human
repetitive DNA sequences
Figure 1. Assembly of a prostate transcriptome. (A) 55 000 prostate ESTs were
examined for "junk" sequences, leaving 49 816 high quality ESTs suitable for
assembly. Clustering the ESTs into groups of high homology produced a set of
21 114 clusters that were annotated against nucleotide and protein sequences in
the public sequence databases. Clusters exhibiting homology to Genbank sequences
were also examined for homology to Unigene sequences (UG) to further collapse
clusters into homologous groups. (B) Following clustering, database annotations
and reclustering, a total of 15 953 distinct prostate EST species were identified.
More than 2000 prostate species did not have homology to nonprostate-derived
sequences in the public databases (unannotated).


[Figure 1 diagram labels: Genbank (5840; 8715 +; 12 399 -); dbEST (10 387 +;
2012 -); Unigene (14 435 +; 9637; 6679 -); Total: 15 953 distinct
clusters/species]

Electrophoresis  2000, 21, 1823-1831 


Comprehensive  analyses  of  prostate  gene  expression  1 827 


were masked and ESTs with > 100 bp of high quality sequence were admitted to the
assembly process (Fig. 1A). A total of 49 816 high quality ESTs were assembled
using the sequence assembly program CAP2 to produce 21 114 clusters. Each cluster
was annotated by searching the Unigene, Genbank, and dbEST databases with the
CAP2-generated cluster consensus sequences using BLASTN. Clusters annotating to
the same database sequence were joined to further reduce the number of distinct
clusters to 15 953 (Fig. 1B).
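
The joining step can be pictured as grouping clusters by their best database hit. The following is a minimal sketch under assumptions: each cluster has at most one top-scoring accession, unannotated clusters remain separate species, and the cluster ids and accessions are invented for illustration. It is not the PEDB build code.

```python
from collections import defaultdict


def merge_by_annotation(best_hit):
    """Group clusters that share a top database hit.

    best_hit maps a cluster id to the accession of its best hit,
    or to None when the cluster has no significant hit.
    """
    groups = defaultdict(list)
    for cluster_id, accession in sorted(best_hit.items()):
        # Unannotated clusters cannot be joined, so each keeps its own key.
        key = accession if accession is not None else ("unannotated", cluster_id)
        groups[key].append(cluster_id)
    return list(groups.values())


# Hypothetical input: five assembled clusters, two sharing one accession.
example = {"c1": "Hs.1001", "c2": "Hs.1001", "c3": None, "c4": "Hs.2002", "c5": None}
species = merge_by_annotation(example)
print(len(example), "clusters ->", len(species), "species")
```

In this toy input, five clusters collapse to four species, mirroring on a small scale the reduction from 21 114 clusters to 15 953 species.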

Studies in the 1970s using reassociation kinetics to estimate the number of
different transcripts indicate that between 10 000 and 30 000 distinct mRNAs are
present in mammalian cells or organs [16, 17]. Recent data produced using the
method of Serial Analysis of Gene Expression (SAGE) support these estimates of
transcript diversity in mammalian epithelial cells, with estimates of
14 000-20 000 different mRNAs per cell [18]. Although the identification of
alternatively spliced transcripts and highly homologous gene family members may
increase or decrease these estimates slightly, they nevertheless provide a rough
estimate of the complexity of cellular gene activity. Based upon these data, the
15 953 prostate EST clusters that we have assembled should characterize roughly
50-75% of the prostate transcriptome. It is likely that this assembled dataset
comprises all of the abundant and most of the moderately abundant prostate
transcripts [6]. Ongoing work involves the acquisition of the remaining low
abundance transcripts. Approaches to achieving this goal involve the construction
of cDNA libraries from highly selected purified cell populations such as luminal
epithelial and neuroendocrine cells, and from prostate tissues at different
stages of development (e.g., fetal prostate) or under different hormonal
influences (e.g., androgen stimulation). Another useful strategy involves the
iterative removal of abundant and previously identified cDNAs in order to select
for rare species. A high-throughput method using cDNA array-based technology has
been developed to facilitate this process [19].
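
The coverage estimate follows from dividing the number of assembled species by an assumed transcriptome size. The text does not say which denominator was used, so the totals below are assumptions drawn from the ranges quoted above, and the function name is hypothetical.

```python
def coverage(observed_species, assumed_total):
    """Fraction of an assumed transcriptome represented by the observed species."""
    return observed_species / assumed_total


OBSERVED = 15953  # distinct prostate EST species reported in the text

# Assumed transcriptome sizes spanning the upper part of the quoted ranges.
for total in (20000, 25000, 30000):
    print(f"{total} transcripts: {coverage(OBSERVED, total):.0%} covered")
```

With these denominators the estimate ranges from about one half to about four fifths, in line with the rough 50-75% figure given above.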
Figure 2. cDNA microarray analysis of prostate androgen-regulated gene expression.
A nonredundant clone set comprised of 1536 cDNAs was hybridized with Cy3-labeled
(red) cDNA from androgen-stimulated LNCaP cells and Cy5-labeled (green) cDNA from
androgen-starved LNCaP cells. The expression ratio for each cDNA was determined,
and the ratios for all cDNAs with signal intensities 2.33-fold above the standard
deviation of the background signal were clustered according to transcript levels
over time. The Cluster and TreeView software programs available at the Stanford
genome web site were used for the analysis (http://rana.Stanford.EDU/software/).
Twenty genes were identified with increased expression after androgen stimulation.

3.2 Prostate gene expression analyses: cDNA microarray

Microarrays comprised of 1500 distinct prostate-derived cDNAs were hybridized with
fluorescently labeled total cDNA probes produced from androgen-stimulated and
androgen-starved LNCaP prostate cancer cells. No cDNAs were identified whose
expression level decreased with androgen stimulation. In contrast, the
hybridization ratios of 20 different cDNAs were consistently increased by > 2-fold
in androgen-stimulated relative to androgen-starved cells (Fig. 2). This group
included cDNAs encoding the human glandular kallikrein 2 (hK2) and human glandular
kallikrein 3 (hK3), also known as prostate-specific antigen (PSA). The regulation
of hK2 and PSA has previously been shown to be mediated by androgens through a
mechanism involving androgen-response element (ARE) binding sites in the promoter
regions of these genes [20, 21].
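
The selection criteria described here (a signal floor relative to background, and a consistent > 2-fold change) can be expressed as a small filter. The sketch below is a simplified reading: it assumes the floor applies to both channels and that "consistently" means every replicate passes. All names and intensity values are hypothetical.

```python
def expression_ratio(stimulated, starved, background_sd, floor_factor=2.33):
    """Return the stimulated/starved ratio, or None if either signal is too weak.

    A spot is kept only when both channel intensities exceed
    floor_factor times the standard deviation of the background.
    """
    floor = floor_factor * background_sd
    if stimulated < floor or starved < floor:
        return None
    return stimulated / starved


def call_regulation(replicate_ratios, fold=2.0):
    """Call a cDNA regulated only if every usable replicate agrees."""
    ratios = [r for r in replicate_ratios if r is not None]
    if not ratios:
        return "no call"
    if all(r > fold for r in ratios):
        return "increased"
    if all(r < 1.0 / fold for r in ratios):
        return "decreased"
    return "unchanged"


# Hypothetical intensities for one cDNA measured on two replicate arrays.
ratios = [expression_ratio(900, 300, 100), expression_ratio(1250, 500, 100)]
print(ratios, call_regulation(ratios))
```

Requiring agreement across replicates reflects the earlier observation that small ratios are not reproducible in this system.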

In addition to hK2 and PSA, we identified several other genes previously shown to
be androgen-regulated, including the prostate homeobox gene NKX3.1 [22], the
serine protease prostase/PRSS17 [23], and two genes involved in lipid metabolism.
The microarray analysis also indicated that the expression of the membrane-bound
serine protease TMPRSS2 [24] was regulated by androgen. We subsequently confirmed
the androgen regulation of TMPRSS2 by Northern analysis, identified a putative ARE
in the TMPRSS2 promoter region, and demonstrated that TMPRSS2 is highly expressed
in the prostate gland relative to other human tissues [25]. Several cDNAs
corresponding to uncharacterized genes also exhibited transcriptional regulation
by androgen (Fig. 2). We have cloned the full-length cDNA and confirmed the
androgen regulation of one of these novel sequences and designated it PART-1, for
Prostate Androgen-Regulated Transcript-1, as it lacks significant homology to
nucleotide or protein sequences in the nonredundant subdivision of the GenBank and
SWISS-PROT databases [26]. Interestingly, the tissue pattern of PART-1 expression
is also essentially restricted to the prostate. The cloning and characterization
of the other identified androgen-regulated cDNAs is in progress.

We anticipate that expanding these studies to include a greater portion of the
prostate transcriptome, coupled with experiments designed to determine direct
versus indirect transcriptional regulation, and ultimately translational and
post-translational regulation of these genes, will establish a framework for
understanding the cellular functions mediated by androgens. Despite the important
influence of androgenic hormones on prostate cancer growth, relatively few
downstream targets of the AR pathway have been described. Studies designed to
identify genes regulated by androgens in the rat prostate determined that
androgens increase the transcription of about 56 genes and decrease the
transcription of fewer than 10 genes [27]. From a therapeutic standpoint, it would
be extremely useful to distinguish and subsequently modulate the relevant
molecules in the AR program that mediate the divergent processes of cellular
proliferation, cellular differentiation, and apoptosis.

Figure 3. (Left) LNCaP 2-DE protein expression profile with androgen stimulation.
(Right) Three-step schema for protein identification using MS and computer
sequence database searching.

Figure 4. Identification of an androgen-regulated protein from metastatic prostate
cancer cells by 2-DE and MS. M12AR cells were (A) starved or (B) stimulated for
24 h with the synthetic androgen R1881 and total cell lysates (40 µg each) were
subjected to 2-DE. Protein expression profiles were compared and proteins
demonstrating qualitative expression level differences were subjected to in-gel
trypsin digestion and identified by µLC-MS/MS analysis. (C), (D) MS/MS spectrum of
identified peptide, peptide sequence, and identified ion series. (E) Results from
correlation of acquired peptide fragmentation spectra with database entries (using
SEQUEST software). The MS/MS spectrum in (D) was identified as NDKA_HUMAN (nm23)
taken from the selected 2-D gel spot. Two additional peptides were identified from
this protein in a single run.

3.3 Prostate gene expression analyses: 2-DE and MS

To complement our prostate transcriptional data and provide a more complete
picture of prostate gene expression, we have undertaken a comprehensive analysis
of that portion of the prostate proteome regulated by androgenic hormones.
Reference protein expression profiles were produced for the LNCaP and M12AR
prostate cancer cell lines using 2-DE protein separation techniques under
steady-state conditions (Fig. 3). Protein expression profiles from cell lysates
under conditions of androgen stimulation and androgen starvation have also been
generated. A comparison of 2-DE protein profiles under these various conditions
yielded a proteomic signature characterized by a subset of proteins with
qualitative and quantitative changes. Individual proteins were identified using a
sequential process of in-gel trypsin digestion and extraction, peptide separation
by µLC, generation of MS/MS spectra, and database correlation with the acquired
peptide fragmentation pattern (Fig. 3).

A comprehensive analysis of androgen-induced proteomic signatures is ongoing, and
our initial experiments demonstrate the utility of this approach in identifying
molecules of potential importance in understanding androgen-mediated regulation
of prostate cancer progression and metastasis. Figure 4 depicts a portion of the
2-DE protein profile from androgen-starved and androgen-stimulated M12AR prostate
cancer cells with a differentially expressed protein spot that is upregulated in
M12AR cells after exposure to androgens. This protein was identified as human
nucleoside diphosphate kinase A (NDKA/nm23), a well-characterized gene with tumor
metastasis suppressor activity in several different human tumors including
melanoma, breast, ovary and prostate [28, 29]. Transfection of the DU-145 prostate
cancer cell line with NDKA/nm23 inhibited adhesion to the cell matrix and impaired
colony growth in soft agar [29].

The M12 prostate cancer cell line is highly tumorigenic when implanted into nude
mice and metastasizes to different anatomical sites. Transfection of these cells
with a functional androgen receptor (M12AR) markedly decreases the proliferation
rate, tumor growth, invasiveness, and in vivo metastatic potential when these
cells are injected into the prostate glands of nude mice (S. Plymate, unpublished
observation). NDKA/nm23 transcripts have been shown to increase rapidly in
prostate cancer cell lines after the administration of androgens, though no
functional ramifications of this increased expression were described [30].

A possible mechanism for the decreased tumorigenic and metastatic capability of
M12AR cells compared with M12 cells lacking the AR involves the upregulation of
NDKA/nm23 by androgens through a functional androgen-response program restored by
the AR transfection and expression. Such an observation has direct clinical
relevance. Both human and in vitro studies suggest that there may be a survival
benefit from maintaining an androgen-responsive cohort of prostate tumor cells
[31-33]. This concept has been studied in the LNCaP cell system by comparing the
rate of tumor growth in castrated mice implanted with LNCaP cells with subsequent
tumor growth (i) without further therapy, or (ii) followed by intermittent
androgen replacement. The rate of tumor growth as measured by serum PSA was slower
in animals treated with intermittent androgen supplementation compared to those
maintained in the castrated state [31].

4  Concluding  remarks 

The results presented here demonstrate the utility of global expression studies to
simultaneously identify multiple genes and gene products of biological relevance
that participate in specific metabolic pathways. Both known and unknown genes are
rapidly identified. Notable advantages of the microarray-based transcript
profiling approach include the ability to perform detailed time-course or
variable drug-dose experiments in a robust, economical fashion. Controlled
replicate experiments can determine system and procedural errors. However, this
approach is absolutely dependent upon the identification of diverse clone sets for
array construction that are biologically relevant to the system under study. In
addition, a significant limitation of transcript profiling methods is the lack of
a tight correlation between gene activity as measured by mRNA level, and protein
abundance [34]. Global protein analyses focus on the actual biological effector
molecules, but are restricted by difficulties in detecting low abundance proteins,
accurately measuring the differences in protein levels between two samples, and a
dependency on comprehensive annotated sequence databases for protein
identification.

Integrating the assembly and annotation of sequence databases with transcript
profiling and proteome analyses combines complementary robust approaches that
capitalize on the strengths and avoid the limitations of relying on one method.
The further expansion of this work to include the analysis of the entire prostate
transcriptome coupled with quantitative proteome studies should enable the
characterization of gene networks and cellular pathways that can be exploited for
therapeutic intervention.

This work was supported in part by the CaPCURE Foundation, and a grant (CA75173)
from the National Cancer Institute, grants (DAMD 17-98-1-8499 and PC991274) from
the Department of Defense to PSN, a grant (R01-DK52683) and Veterans Affairs Merit
Review Program to SRP, the Science Technology Center for Molecular Biotechnology,
and a grant (NIH CRR11823) to RA. We wish to thank Joy Ware for the M12 prostate
cancer cell line, Steve Lasky and the UW Molecular Biotechnology sequencing center
for DNA sequencing support, and Roger Bumgarner, Nigel Clegg, Victoria Hawkins,
and Burak Eroglu for bioinformatics support.

Received  December  8, 1999 


5  References 

[1] Denmeade, S. R., Lin, X. S., Isaacs, J. T., Prostate 1996, 25, 251-265.

[2] Isaacs, J. T., Wake, N., Coffey, D. S., Sandberg, A. A., Cancer Res. 1982, 42,
2353-2371.

[3] Bruchovsky, N., Brown, E. M., Coppin, C. M., Goldenberg, S. L., Le Riche,
J. C., Murray, N. C., Rennie, P. S., Progr. Clin. Biol. Res. 1987, 239, 347-387.

[4] Velculescu, V. E., Zhang, L., Zhou, W., Vogelstein, J., Basrai, M. A.,
Bassett, D. E., Hieter, P., Vogelstein, B., Kinzler, K. W., Cell 1997, 88,
243-251.

[5] Hawkins, V., Doll, D., Bumgarner, R., Smith, T., Abajian, C., Hood, L.,
Nelson, P. S., Nucleic Acids Res. 1999, 27, 204-208.

[6] Nelson, P. S., Ng, W. L., Schummer, M., True, L. D., Liu, A. Y., Bumgarner,
R. E., Ferguson, C., Dimak, A., Hood, L., Genomics 1998, 47, 12-25.

[7] Strausberg, R. L., Dahl, C. A., Klausner, R. D., Nature Genet. 1997, 75,
415-416.

[8] Huang, X., Genomics 1996, 33, 21-31.

[9] Webber, M. M., Bello, D., Quader, S., Prostate 1997, 30, 58-64.

[10] Bae, V. L., Jackson-Cook, C. K., Maygarden, S. J., Plymate, S. R., Chen, J.,
Ware, J. L., Prostate 1998, 34, 275-282.

[11] Bae, V. L., Jackson-Cook, C. K., Brothman, A. R., Maygarden, S. J., Ware,
J. L., Int. J. Cancer 1994, 55, 721-729.

[12] Garrels, J. I., Franza Jr., B. R., J. Biol. Chem. 1989, 264, 5283-5298.

[13] Blum, H., Beier, H., Gross, H., Electrophoresis 1987, 8, 93-99.


[14] Corthals, G. L., Gygi, S. P., Aebersold, R., Patterson, S. D., in: Rabilloud,
T. (Ed.), Proteome Research: 2D Gel Electrophoresis and Detection Methods,
Springer, New York 1999, pp. 197-232.

[15] Yates III, J. R., Eng, J. K., McCormack, A. L., Schieltz, D., Anal. Chem.
1995, 67, 1425-1436.

[16] Hastie, N. D., Bishop, J. O., Cell 1976, 9, 761-774.

[17] Bishop, J. O., Morton, J. G., Rosbash, M., Richardson, M., Nature 1974, 250,
199-204.

[18] Zhang, L., Zhou, W., Velculescu, V. E., Kern, S. E., Hruban, R. H., Hamilton,
S. R., Vogelstein, B., Kinzler, K. W., Science 1997, 275, 1268-1272.

[19] Nelson, P. S., Hawkins, V., Schummer, M., Bumgarner, R., Ng, W. L., Ideker,
T., Ferguson, C., Hood, L., Genet. Anal. 1999, 75, 209-215.

[20] Riegman, P. H., Vlietstra, R. J., van der Korput, J. A., Brinkmann, A. O.,
Trapman, J., Mol. Endocrinol. 1991, 5, 1921-1930.

[21] Murtha, P., Tindall, D. J., Young, C. Y., Biochemistry 1993, 32, 6459-6464.

[22] He, W. W., Sciavolino, P. J., Wing, J., Augustus, M., Hudson, P., Meissner,
P. S., Curtis, R. T., Shell, B. K., Bostwick, D. G., Tindall, D. J., Gelmann,
E. P., Abate-Shen, C., Carter, K. C., Genomics 1997, 43, 69-77.

[23] Nelson, P. S., Gan, L., Ferguson, C., Moss, P., Gelinas, R., Hood, L., Wang,
K., Proc. Natl. Acad. Sci. USA 1999, 96, 3114-3119.

[24] Paoloni-Giacobino, A., Chen, H., Peitsch, M. C., Rossier, C., Antonarakis,
S. E., Genomics 1997, 44, 309-320.

[25] Lin, B., Ferguson, C., White, J. T., Wang, S., Vessella, R., True, L. D.,
Hood, L., Nelson, P. S., Cancer Res. 1999, 59, 4180-4184.

[26] Lin, B., White, J. T., Ferguson, C., Bumgarner, R., Friedman, C., Trask, B.,
Ellis, W., Lange, P., Hood, L., Nelson, P. S., Cancer Res., in press.

[27] Wang, Z., Tufts, R., Haleem, R., Cai, X., Proc. Natl. Acad. Sci. USA 1997,
94, 1299-3004.

[28] Freije, J. M., MacDonald, N. J., Steeg, P. S., Biochem. Soc. Symp. 1998, 53,
261-271.

[29] Lim, S., Lee, H. Y., Lee, H., Cancer Lett. 1998, 133, 143-149.

[30] Yoshimura, I., Wu, J. M., Chen, Y., Ng, C., Mallouh, C., Backer, J. M.,
Mendola, C. E., Tazaki, H., Biochem. Biophys. Res. Commun. 1995, 208, 603-809.

[31] Sato, N., Gleave, M. E., Bruchovsky, N., Rennie, P. S., Goldenberg, S. L.,
Lange, P. H., Sullivan, L. D., J. Steroid Biochem. Mol. Biol. 1996, 55, 139-146.

[32] Grossfeld, G. D., Small, E. J., Carroll, P. R., Urology 1998, 57, 137-144.

[33] Oliver, R. T., Williams, G., Paris, A. M., Blandy, J. P., Urology 1997, 49,
79-82.




SEQUENCE  DATABASES  AND  MICROARRAYS  AS  TOOLS 
FOR  IDENTIFYING  PROSTATE  CANCER  BIOMARKERS 

LYNETTE  H.  GROUSE,  PETER  J.  MUNSON,  and  PETER  S.  NELSON 


ABSTRACT 

Identification, acquisition, and assessment of molecular markers that could be
adopted as surrogate endpoints for evaluating a response to prostate cancer
intervention strategies is highly desirable. Recent advances in the fields of
genomics and biotechnology have dramatically increased the quantity and
accessibility of molecular information that is relevant to the study of prostate
carcinogenesis. One major advance involves the construction of comprehensive
databases that archive gene sequences and gene expression data. This information
is in a format suitable for virtual queries designed to distinguish the molecular
differences between normal and cancer cells. A second major advance uses robotic
tools to construct microarrays comprising thousands of distinct genes expressed in
prostate tissues. Such arrays offer a powerful approach for monitoring the
expression of thousands of genes simultaneously and provide access for techniques
designed to assess patterns or "fingerprints" of gene expression that may
ultimately be used as signatures of response to therapeutic intervention.
UROLOGY 57 (Suppl 4A): 154-159, 2001. © 2001, Elsevier Science Inc.


The human genome is estimated to comprise approximately 30,000 to 100,000 genes.
To confer developmental and functional specificity, only a fraction of this total
is active in a given cell type at a given time, and these expressed genes
essentially define the state of that cell. The molecular profile of normal and
cancer cells, ie, their set of expressed genes, differs in both qualitative
(alternative forms of a gene) and quantitative fashions. Measurement of this
profile may predict the phenotypic behavior of such cells more accurately than
traditional histologic approaches.

To identify informative biomarkers and suitable intermediate endpoints of disease,
it would be advantageous to have a catalog or index of all genes and their cognate
proteins that are expressed in normal and neoplastic prostate tissues. This
resource could then be rapidly exploited to identify candidate biomarkers for
evaluation based on homology to known genes of importance in prostate cancer, gene
polymorphisms and mutations, or alterations in gene expression. This review will
focus particularly on the use of tissue-specific expressed sequence tag (EST)
databases, the development and use of cDNA microarrays, and statistical issues
related to microarray analyses. These approaches may become essential for
identifying new biomarkers in prostate cancer.

DATABASES  AS  TOOLS  FOR  BIOMARKER 
IDENTIFICATION 

In 1997, the National Cancer Institute announced a bold new initiative, the Cancer
Genome Anatomy Project (CGAP), with the overall goal of achieving the
comprehensive molecular characterization of normal, precancerous, and cancerous
cells. The CGAP is an interdisciplinary program that uses National Institutes of
Health intramural research teams, academic centers, and commercial resources to
establish an index of genes expressed in tumors. The CGAP serves as an interface
between genomics and cancer research. The new technologies supported by this
initiative, and the products resulting from these technologies, will be accessible
to the public through an Internet website (http://www.cgap.nci.nih.gov). This
Internet site provides information about cDNA libraries of
FIGURE 1. Gene expression profiles in normal, precancerous, and malignant prostate
tissue. Differences in gene expression between cDNA libraries prepared from
various types of prostate tissues can be analyzed by using the Digital
Differential Display (DDD) software program. Library A is prepared from normal
prostate epithelium, Library B from precancerous prostate tissue, Library C from
malignant prostate cancer, and Library D is a control library prepared from a pool
of brain, liver, and spleen tissue. The Gene index contains the UniGene Cluster
Identifier, and Gene Description lists the gene name. In each box, the number at
the top represents the fraction of sequences in that cDNA library that expresses
the gene or EST. The dot is a visual aid, which reflects the numerical values.
Each library is compared with each of the other libraries in pairwise analysis. If
the difference in gene expression between two libraries is statistically
significant, it is indicated by a greater than or less than symbol.


normal and cancerous tissue, description of the methods used in preparing each
library, and informatics tools to perform analyses of gene expression using cDNA
library data.

A goal of the CGAP is to facilitate the identification of possible molecular
biomarkers for various types of cancer. To enable investigators to analyze
molecular databases that are very large and complex, CGAP has developed software
tools in collaboration with the National Center for Biotechnology Information at
the National Institutes of Health. These software tools aid in the analysis and
comparison of gene expression in a variety of tissues and stages of cancer. All of
these tools are available on the CGAP Internet website.

An example of a software analysis tool is Digital Differential Display (DDD).'^
DDD is used to compare sequence-based gene expression profiles among individual
cDNA libraries or pools of libraries from the same or different tissues. Analysis
of different gene expression profiles may identify genes that contribute to a
cell's unique characteristics. Such genes, when expressed at different levels in
normal and cancer cells, may be considered as candidate biomarkers for use in
cancer screening. DDD uses a statistical comparison of genes expressed in each
cDNA library to determine which differences are statistically significant. The
statistical analysis is based on the Fisher exact test.5 Differences in gene
expression values are presented both visually and numerically.
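
To make the statistical comparison concrete, the sketch below applies a two-sided Fisher exact test to EST counts from two libraries. It is a generic textbook implementation, not the DDD program itself, whose internal details are not given here. The function name and the library counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two libraries; columns are ESTs matching the gene
    and all other ESTs in that library.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x gene ESTs in the first library.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: a gene seen in 12 of 5000 ESTs in one library
# and in 2 of 6000 ESTs in another.
p_value = fisher_exact_two_sided(12, 5000 - 12, 2, 6000 - 2)
print(f"p = {p_value:.4g}")
```

An exact test suits this setting because many genes are represented by only a handful of ESTs per library, where large-sample approximations are unreliable.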

An example of a DDD analysis of three cDNA libraries made from prostate tissue is
shown in Figure 1. Row 1 shows an expression profile of a gene that has a known
function, whereas rows 2 to 5 show expression differences between genes of unknown
function, referred to as ESTs. Row 1, column D, shows that all three prostate cDNA
libraries have increased expression of the DIO1 gene compared with that of
control. Within the prostate libraries, column A shows increased gene expression
compared with that of the precancerous library in column B, but was not shown to
be statistically significantly increased when compared with malignant prostate
cancer tissue libraries. The power of this analysis is in identification of
possible biomarkers within anonymous EST sequences. In row 2, the expression of
this EST is increased over control in only the precancerous prostate cDNA library,
whereas the EST in rows 3 and 4 is

155 


UROLOGY  57  (Supplement  4A),  April  2001 


increased  only  in  cancerous  prostate  tissues.  These 
genes  could  be  evaluated  as  candidate  biomarkers 
to  identify  prostate  cancer  disease  progression.  The 
Prostate  Expression  Database  (PEDB)  (http://www. 
mbt.washington.edu/PEDB)^  is  another  online  re¬ 
source  of  prostate  genetic  information.  The  PEDB 
is  a  curated  relational  database  and  suite  of  analysis 
tools  designed  specifically  for  the  study  of  prostate 
gene  expression  in  normal  and  diseased  states. 
The  ESTs,  derived  from  more  than  40  human 
prostate  cDNA  libraries,  are  assembled  into  dis¬ 
tinct  species  groups  that  are  annotated  with  in¬ 
formation  from  the  GenBank,  dbEST,  and  Uni¬ 
gene  public  sequence  databases.  The  expression 
pattern  of  each  gene  can  be  viewed  across  all 
libraries  or  tissues  using  the  Virtual  Expression 
Analysis  Tool  (VEAT),  a  graphical  user  interface 
written  in  Java  for  intra-  and  interlibrary  gene 
expression  analyses. 

cDNA  EXPRESSION  ARRAYS  FOR 
BIOMARKER  IDENTIFICATION 

The inherent heterogeneity of prostate cancers and the diversity of
therapeutic interventions suggest that it is unlikely that a single
biomarker or intermediate endpoint can be identified that will provide
sufficient sensitivity or specificity for assessing a treatment
response. Efforts have been directed toward methods of simultaneously
measuring multiple biomarkers at the DNA, RNA, or protein levels. Such
a multiplexed approach will greatly expand the information gained from
each patient sample and clinical trial. In addition, patterns in
biomarker data may be identified that together exceed the sum of
individual measurements.

Recent developments in informatics, miniaturization, and robotics have
provided new, extremely powerful approaches for comprehensive
measurements of genetic alterations that occur in neoplasia. These
measured alterations could also reflect a response (or lack of
response) to a chemopreventive or therapeutic agent. One such
comprehensive approach involves the use of DNA arrays, a technique that
combines the proven chemistry of nucleic acid hybridization with
advanced automation and imaging technology to quantitatively detect
changes in the expression levels of thousands of genes simultaneously.
DNA arrays have been assembled in several configurations, including
oligonucleotide arrays, microarrays of cDNA spotted on glass slides,
and DNA spotted onto nylon membranes. The basic method is
straightforward: DNA representing a particular gene of interest is
either spotted (printed) or synthesized onto a solid support, such as a
glass microscope slide, silicon wafer, or nylon membrane (Figure 2).
The procedure is repeated in an automated fashion with thousands of
different genes, such that each is deposited in a precise spatial
location that allows for the subsequent identification of any
individual spot. Probes representing the expressed genetic information
in a tissue sample are labeled with radioactive or fluorescent markers
that can be quantified by sensitive detectors and used for comparative
analyses. A limitation on the number of individual elements that can be
placed on the area of a given "chip" array places a premium on
efficient construction. This is accomplished by eliminating redundancy
(maximizing diversity) and incorporating DNA sequences that are
relevant for the biological system under study.

Gene expression catalogs, such as the CGAP and the PEDB, can be
exploited for the construction and analysis of cDNA expression arrays
by providing a virtual archive of thousands of genes expressed in
prostate tissue. Coupling this virtual repository with the physical
clones representing the corresponding DNA molecules allows for the
construction of comprehensive arrays. The continued expansion of this
resource to encompass all prostate transcripts will allow for the
simultaneous analysis of all genes expressed in normal and neoplastic
prostate cells. This effort will require extensive testing on prostate
tumors and a further refinement of the methods to include statistical
measures of biological and experimental variance.

STATISTICAL  ISSUES  IN  THE  ANALYSIS  OF 
cDNA  MICROARRAYS 

Special-purpose, tissue-specific cDNA microarrays can now be routinely
generated using commercially available spotting robots, using either
glass-based or nylon-based substrates. A growing number of commercial
cDNA microarrays are also available, giving smaller labs the
opportunity to use this technology. Careful attention to the design and
statistical analysis of each experiment is essential, especially given
the high cost of microarrays. As with any assay procedure, microarray
data are subject to three major sources of random and systematic error:
reagent quality, sample preparation, and laboratory technique. The most
important reagent is the microarray itself, which may be subject to
significant batch-to-batch variability. The array may include clones of
questionable quality, possibly including troublesome repetitive DNA or
another contaminating sequence. Variability of the substrate, either
nylon or glass, can have a marked effect on the uniformity of the array
image. Sample preparation includes all tissue handling, cell isolation,
RNA extraction, and labeling steps. Unintended variation in RNA content
may easily result from poor temperature control, heat shock,
degradation, sample handling, etc. Fluorescence or radioactive label
incorporation may also be subject to variation and can strongly
influence the results. During the hybridization of labeled probe to the
target cDNA on the array, carefully controlled time, temperature, and
agitation conditions should prevail. Issues of saturation and dynamic
range compression may arise during image acquisition and storage.

FIGURE 2. cDNA microarray construction and analysis. Microarray assays
are performed in a multistep process. First, microarrays are prepared
by assembling sets of cDNA clones in 96- or 384-well microtiter plates.
Small tweezer tips or needles attached to a robotic arm are used to
withdraw small amounts of the DNA solution from the microtiter plates
and print them onto glass microscope slides in a precise spatial
orientation with high replicative fidelity. cDNA probes are prepared
from two distinct tissue sources (eg, normal tissue and neoplastic
tissue) by first extracting RNA followed by a conversion step to cDNA
that incorporates a different fluorescent dye into the different tissue
source cDNA (eg, green for normal and red for neoplastic). These
labeled cDNA probes are then combined and hybridized to the microarray
such that cDNAs in the probe will attach to their complementary cDNA
spot on the microarray surface. Nonhybridizing cDNAs are removed by a
washing step, and the remaining bound cDNA molecules are quantitated by
measuring the fluorescent intensity at every spot location. Array
analyses determine the ratio of intensities at each spot and thus
identify specific genes that are overexpressed in normal tissue
relative to neoplastic (green spot), overexpressed in neoplastic
relative to normal (red spot), or expressed at equivalent levels
(yellow spot).
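
The ratio readout described in the Figure 2 legend, combined with a
check that a call recurs in a second, independent hybridization, can be
sketched as follows. This is a minimal illustration, not the analysis
used by the authors: the function names, the gene labels, and the
intensity values are hypothetical, and the fourfold cutoff is simply
one possible threshold.

```python
from math import log2


def call_spot(red, green, fold=4.0):
    """Label one spot from its red (neoplastic) and green (normal) signal."""
    if red <= 0 or green <= 0:
        return "no-call"  # missing or background-level signal
    log_ratio = log2(red / green)
    limit = log2(fold)
    if log_ratio >= limit:
        return "up in neoplastic"  # red spot
    if log_ratio <= -limit:
        return "up in normal"      # green spot
    return "equivalent"            # yellow spot


def consistent_calls(rep1, rep2, fold=4.0):
    """Keep genes that receive the same non-equivalent call in both replicates.

    rep1, rep2: dicts mapping gene name -> (red, green) intensities.
    """
    kept = {}
    for gene in rep1.keys() & rep2.keys():
        first = call_spot(*rep1[gene], fold)
        second = call_spot(*rep2[gene], fold)
        if first == second and first not in ("equivalent", "no-call"):
            kept[gene] = first
    return kept


# Hypothetical intensities from two independent hybridizations.
replicate_1 = {"geneA": (4000, 500), "geneB": (900, 100), "geneC": (300, 310)}
replicate_2 = {"geneA": (3600, 600), "geneB": (200, 150), "geneC": (280, 300)}
print(consistent_calls(replicate_1, replicate_2))
```

Working on the log scale treats a fourfold increase and a fourfold
decrease symmetrically. In this example geneB passes the cutoff in the
first hybridization only, so it is dropped.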

By far the most straightforward way to address each of these issues is
by use of independent replicated experiments. Apparent gene expression
changes that persist through such repeated experiments can correctly be
ascribed to interesting biological changes rather than artifacts of the
assay itself. The following illustrative analysis of duplicate
experiments easily screens out many artifactual expression changes. We
compared a melanoma cell line to a prostate tumor cell line for
expression differences on a prostate-specific, nylon-based cDNA array.
Spot intensities were quantified using the P-SCAN software (available
at http://abs.cit.nih.gov/PSCAN). The intensities of each spot are
compared in Figure 3A, which at first seems to indicate that a large
number of genes have greater than fourfold changes in relative
expression levels between the two cell types. Analysis of a duplicate
experiment gave a different picture. Figure 3B shows that a much
smaller number of genes undergo greater than fourfold changes
consistently in both experiments, meaning that many of the apparent
fourfold changes in the first experiment were "false-positives." A
family of differentially expressed genes also clearly emerged and was
later confirmed by Northern blot analysis. Reduction in the number of
false-positives can be essential when using microarray technology to
look for new cancer markers, as tens of thousands of clones must be
screened.

FIGURE 3. Comparison of melanoma (M) expression levels to that from a
prostate tumor cell line (P). (A) Normalized intensities show more than
148 genes with apparent expression change over fourfold (up, squares or
down, triangles). (B) Expression ratios are compared for duplicate
experiments (P1/M1, P2/M2). Only 31 genes are consistently over- or
underexpressed by greater than fourfold in both experiments.

APPLICATION TO CHEMOPREVENTION

Among their many applications, database and array-based methods of
genetic analysis can be useful for the identification, acquisition, and
assessment of candidate molecular markers that could be adopted as
surrogate endpoints for assessing preventive strategies
(chemoprevention or nutritional intervention). One scenario involves a
cohort of patients diagnosed with low- or intermediate-grade prostate
cancers by needle biopsy. Patients who elect to forgo primary therapy
(radical prostatectomy or radiotherapy) could be offered a
chemopreventive agent aimed at halting cancer progression. Gene
expression profiles of tumor tissue before and after the
chemopreventive agent would be assessed for expression patterns
correlating with a propensity for the cancer to progress, indicating
that a primary therapy should be offered, or for the cancer to respond
to the chemopreventive agent and thus require no further intervention
(Figure 4). The development of this type of assay is clearly desirable,
but defining predictive patterns of expression is not a trivial task.

FIGURE 4. Molecular profiles as surrogate endpoint biomarkers. Patients
with localized prostate cancer are treated with a chemopreventive (CP)
agent. A biopsy is performed and subjected to molecular profiling by
cDNA microarray analysis. The pattern of expression is compared with
reference patterns previously shown to correlate with a tumor response
or lack of tumor response to the CP agent. These data are used to guide
further therapeutic intervention.
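
One simple way to picture the comparison of a biopsy profile with
reference patterns (Figure 4) is a nearest-pattern match by Pearson
correlation. The sketch below is only an assumption about how such a
comparison might be set up: the function names, the reference labels,
and the log-ratio values are all hypothetical, and a usable assay would
require validated reference patterns and far more genes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def closest_reference(profile, references):
    """Return the best-correlated reference label and all correlation scores."""
    scores = {label: pearson(profile, ref) for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical log2 expression ratios for five genes.
reference_patterns = {
    "response": [2.0, 1.5, -1.0, -2.0, 0.2],
    "no response": [-1.5, -0.5, 1.0, 2.0, 0.0],
}
biopsy_profile = [1.8, 1.0, -0.7, -1.6, 0.1]

label, scores = closest_reference(biopsy_profile, reference_patterns)
print(label, {name: round(value, 2) for name, value in scores.items()})
```

Correlation compares the shape of a profile rather than its absolute
magnitude, which makes the match less sensitive to overall differences
in labeling efficiency between samples.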